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In this article we describe a method for carrying out Bayesian es- 
timation for the double Pareto lognormal (dPIN) distribution which 
has been proposed as a model for heavy-tailed phenomena. We ap- 
ply our approach to estimate the dPIN/M/l and M/dPlN/1 queue- 
ing systems. These systems cannot be analyzed using standard tech- 
niques due to the fact that the dPIN distribution does not possess 
a Laplace transform in closed form. This difficulty is overcome using 
some recent approximations for the Laplace transform of the interar- 
rival distribution for the Pareto /M /l system. Our procedure is illus- 
trated with applications in internet traffic analysis and risk theory. 

1. Introduction. Heavy-tailed distributions have been used to model a 
variety of phenomena in areas such as economics, finance, physical and bi- 
ological problems; see Adler, Feldman and Taqqu (f999). In particular, a 
number of variables in teletraffic engineering, such as file sizes, packet ar- 
rivals, etc., have been shown to possess heavy-tailed distributions; this can 
be found, for example, in Paxson and Floyd (1995). Also, in an actuarial 
context, insurance claim sizes can often be very large and in such cases, may 
be modeled as long tailed; see, for example, Embrechts, Kluppelberg and 
Mikosch (1997). For a detailed review of heavy-tailed distributions, we refer 
the reader to Sigman (1999). 

The Pareto distribution has often been applied to model the heavy-tail be- 
havior of teletraffic variables [Resnick (1997)] and insurance claims [Philbrick 
(1985)]. In particular, in Ramirez, Lillo and Wiper (2008) a mixture of k 
Pareto distributions (k-Par) is used to model ethernet packets interarrival 
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times. However, although the Pareto distribution often models the tails of a 
distribution well, it is unimodal and decreasing, which means that it will not 
model the body of the distribution correctly in many modeling situations as 
is shown in some of the examples in this paper. 

Reed and Jorgensen (2004) recently introduced the double Pareto log- 
normal (dPIN) distribution as a versatile model for heavy-tailed data and 
considered various frequentist approaches to inference for this distribution. 
They did not recommend the method of moments as an estimation method, 
and observed that the EM algorithm sometimes encounters convergence 
problems. In this work we focus on the Bayesian approach, which may be 
preferred for problems where the interest is not only in inference but also 
in prediction; see, for example, Robert (2001). The first objective of this 
paper is thus to develop an algorithm to implement Bayesian inference for 
the dPIN distribution. 

The study of congestion in teletraffic systems and of ruin problems in 
insurance is directly related to the analysis of queueing systems, where the 
arrival or service process are defined by a heavy-tailed distribution. In this 
paper we consider the dPIN/M/l and M/dPlN/\ queues, which, to our 
knowledge, have not been considered before in the literature. 

The usual moment generating function approach to obtaining the equi- 
librium distribution of a queue [Gross and Harris (1998)] is difficult to im- 
plement because the dPIN distribution lacks a moment generating function 
in closed form. An alternative, which we shall apply, is based on a direct 
approximation of the nonanalytical Laplace transform using a variant of the 
transform approximation method (TAM); see Harris and Marchal (1998), 
Harris, Brill and Fischer (2000) and Shortle et al. (2004). The first ver- 
sion of the TAM, known as Uniform TAM or U-TAM, was implemented in 
Ramirez, Lillo and Wiper (2008), where estimation of the k-Par / M /l queue 
was considered. In this paper we propose a variant of the TAM based on 
both the Uniform and Geometrical TAMs. By combining this variant of the 
TAM with the Bayesian inference method for the dPIN distribution, we can 
obtain estimates of queueing properties of interest such as the probability 
of congestion. 

This paper is organized as follows. In Section 2 we review the defini- 
tion and key properties of the dPIN distribution and present an approach 
to Bayesian inference for this distribution, illustrating our procedure with 
simulated and real data. In Section 3 we examine the dPIN/M/l queue- 
ing system and show how the TAM approach can be used to approximate 
the Laplace transform of the dPIN distribution. Our results are then ap- 
plied to a real example of internet traffic arrivals. In Section 4 we study the 
Mj dPIN /l queueing system and show how the waiting time distribution of 
this system can be estimated. We then apply our results to the estimation 
of the ruin probability given real insurance claims data. Conclusions and 
possible extensions to this work are considered in Section 5. 



INFERENCE FOR DOUBLE PARETO LOGNORMAL QUEUES 3 
2. Bayesian inference for the double Pareto lognormal distribution. 

2.1. The double Pareto lognormal distribution. A random variable Y is 
said to have a Normal Laplace distribution (NL), denoted Y ~ NL(a, (3, v, r 2 ) 
if Y = Z + W, where Z ~ N(u,t 2 ), and W is a skewed Laplace distributed 
variable with density function 



f w (w\a,(3) = 

< a + j3 

independent of Z, for a,/3 > 0. The density function of Y is 



e pw , ifw<0, 



a + fi 

-e~ aw , tfw>0, 



2^ afi Jy-v 



f Y (y\a,(3,v,T ) 

a + p \ t 
x [R(ar -(y- v)/r) + R((3t + (y - u)/r)], 
where R{z) is the Mill's ratio defined by 

(2.1) R(z) = <S> c (z)/4>{z), 

where <3? c (z) = 1 — $(z) and <j>{z) and &(z) are the standard normal density 
and cumulative distributions respectively. 

A random variable, X, is said to have a dPIN distribution with parameters 
(a,/3, v, t 2 ) if X = exp(y) where Y is Normal Laplace distributed. 

The usual change of variable to the density of Y gives the density of X 
to be 

, , | R 2n aft fl\ M ( \QgX- v" 

f x {x\a,p,u,T H^g^-J^ - 

x [R(ctT - (log x-u) /t) + R((3t + (log x - v)/r)] . 

Also, Reed and Jorgensen (2004) show that the dPlN(a, (3,v,t 2 ) can be 
represented mixture as 

/x(x|a,/3,i/,T 2 ) = - /i(x|a, ^, t 2 ) H ^—f 2 (x\f3,v,T 2 ), 

a + p a + p 

where the densities 

(2.2) / l( x|a,,,r 2 ) = a,— V^/* ^og(z) - , - ar 2 \ 

(2.3) / 2 (*|p>,r 2 ) =^- 1 e-^^/ 2 g ^*) ~ " + ^ 

are, respectively, the limiting forms (as j3 — > oo and a — )• oo) of the dPlN(a, (3, 
v.t 2 ) distribution. 
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Reed and Jorgensen (2004) illustrate the form of the dPIN density func- 
tion for various different groups of parameter values. In particular, they 
show that it exhibits upper power-tail behavior in that fx( x ) ~ > kx~ a ~ l 
as x — y oo. The dPIN distribution does not possess a moment generating 
function in closed form. However, if r < a, the moment of order r exists: 

E{JT \a, (3, v,r 2 ) = e «^/2. 

[a — r)(p + r) 

Reed and Jorgensen (2004) also illustrate a procedure for frequentist infer- 
ence for the dPIN distribution using the EM algorithm and note that under 
certain conditions, this approach suffers from problems of convergence. An 
alternative procedure which has not been examined thus far is to take a 
Bayesian approach, as we do here. 

2.2. Bayesian inference. Given a random sample x= {x\, . . . ,x n ) from 
the dPlN(a,f3,v,T 2 ), the goal is to compute a posterior distribution 
/(a, /3, v, t 2 |x). For ease of notation, we define = (a, f3,v,T 2 ) in what fol- 
lows. It is easier computationally to work with the normal Laplace, hence, 
we define y = (j/i, . . . , y n ), where yi = log(xj), i = 1, . . . ,n, and compute the 
posterior density function f(6\y) using the normal Laplace likelihood. 

The definition of a normal Laplace random variable Y ~ NL(a,f3,i , ,T 2 ) 
suggests the use of a Gibbs sampler where one considers the two components 
of Y as auxiliary variables to be sampled along with so that sampling 6 
then reduces to sampling (a, (5) and (u, t 2 ) from distributions with truncated 
skewed Laplace and Gaussian likelihoods respectively. The classical EM al- 
gorithm developed in Reed and Jorgensen (2004) was based on a similar 
idea, but, as noted earlier, this can show convergence problems. 

The conditional distribution of Z\Y = y, a, j3, v, r 2 is a mixture of two 
truncated normal variables as stated in the following proposition. 

Proposition 1. The conditional distribution of Z\Y,a, f3,b>,r 2 is 
a weighted mixture of two truncated normal densities: 

( 4>{z^) 0(- zQ ) \ 

f z]y (z\y,a,(3,is,T ) = ^R{y p ) r$c( ^ ) h>y + R(Va) r$c(yQ ) ^<v ) 

/(R(y a ) + R(yp)), z G R, 

where R(-) is given in (2.1), and 

y a = ar -{y-v)/r, yp = fir + (y - u)/t, 
ot y-{v + r 2 a) g y-(u-r 2 /3) 

y = , y p = 1 

r r 
zOC = z- (v + T 2 a) ^ = z-{u-t 2 I3) 
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For a proof of Proposition 1 see Appendix A. 

Note now that we can express the skewed Laplace distribution as the 
difference of two exponential variables, that is, 

W = E\ — E2 where E\\a ~ 6{a) and E% \(3 ~ 

The following proposition specifies the conditional distribution of £7i|PF. 

Proposition 2. The distribution of Ei\W,a, (3 is a truncated exponen- 
tial with support [max{w, 0}, oo), 

(2.5) f Ellw ( ei \w,a,(3) = e . (a+p)wI 

1 w<0 \ ^ J-w>0 

for e± > max{tt>, 0}. 

The proof of Proposition 2 can be found in Appendix B. Given a sample, 
(yi, . . . , y n ) conditional on the parameters (a, /3, v, r 2 ), then we can generate 
(zi,...,z n ) from the formula in Equation (2.4). Also, we can define w = 
y - z, wi = yi - zi, w n = y n - z n and then generate ei = (e^i, e^ n ) 
from the formula in Equation (2.5) and define e 2 = ei — w. To undertake 
inference for u, and r 2 , let us suppose that we use a normal, inverse gamma 
prior distribution 

/ ^ 

(2.6) 



v\t 2 




m,- 


1 




a b 




~g( 




T 2 




2' 2 



(2.7) 

Then, from standard Bayesian theory [see, e.g., Box and Tiao (1973) 



v\t 2 ,z ~ 



A/' 



km + nz t 2 



k + n ' k + n 



1. „(a + n b+(n-l)s 2 z + (kn/(k + n))(m- zf 
z ~ G 



T 2 



where z = Y17=i Zi l n anc ^ s z = Ya=i XX z « ~~ z ) 2 / ( n — !)• Also, given gamma 
priors a ~ G(c a ,d a ), j3 ~ G{cp,dp), then 

(2.8) a\ei ~ Q(c a +n, d a + ne~i), 

(2.9) P\e 2 ~G(cp+ n,dp + ne 2 ). 

Of course, many other prior structures are possible. In particular, it might be 
assumed that v and r 2 are independent a priori, or that there is some prior 
dependence between a, f3 and v, r 2 . In the presence of real prior information, 
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the use of such alternative structures could lead to more flexible modeling. 
However, the main disadvantage is that the semi conjugate structure implied 
given the proposed prior distributions is lost and more complex MCMC 
algorithms would have to be used to undertake inference. 
Therefore, we can define the following Gibbs algorithm: 

1. Set initial values aW^W 2 ^ . 

2. For t=l,...,T 

a. For i = 1, . . . , n , 

al. Generate zf from f{z\a^, v^.r 2 ^ {&) . 

a2. Set wf ) = y t - zf . 

a3. Generate ef] from /(eiK,^* -1 ),^* -1 )). 
a4. Set eg = 

b. Generate r 2W ~ /(r 2 | Z W) . 

c. Generate i/W ~ /(z/|z(*),r 2(t) ) . 

d. Generate a® ~ /(a|ej ) . 

e. Generate ~/( / 9|4* ) ). 

In the presence of little prior information, it would appear natural to use 
a noninformative, improper prior distribution. However, it is easy to show 
that in this case, the posterior distribution is also improper. 

Proposition 3. If an improper prior distribution for a and f3 is used 
in the sense that /(a|/3) da diverges for all a > 0, (3 > or f(/3\a) dj3 
is a divergent integral for any b>0, a > 0, then the posterior distribution is 
also improper. 

The proof of Proposition 3 can be found in Appendix C. This implies that 
in order to carry out Bayesian inference, it is fundamental to use a proper 
prior distribution for a, (3. 

2.3. Illustration with simulated and real data sets. 

Example 1. As an illustration of the proposed Gibbs sampler with sim- 
ulated data, consider a sample of size 1000, generated from dPIN (0.25, 0.5, 1, 1). 
The Gibbs algorithm was run for 500,000 iterations with initial values set 
to (o) = (0.2625,0.5529, 1.1992,0.8147), the maximum likelihood estimates. 
The hyperparameters were set to m = 0, k = 4 in (2.6), a = b = 1 in (2.7) 
and c a = cp = d a = dp = 1 in (2.8)-(2.9), and from now on these are the 
values used in the rest of the examples. In order to avoid high autocorrela- 
tion, we did thinning and took one sample out of 50. Gibbs sampler code 
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Fig. 1. MCMC trace plots for Example 1. The Gibbs algorithm was applied to a sample 
of 1000 data generated from a dPIN distribution with parameters = (0.25, 0.5, 1, 1) . 



was written in Matlab and, when run on Intel Core Duo at 2.4 GHz and 
2 GB of DDR3 RAM, took approximately 19 minutes to perform 100,000 
iterations. Figure 1 illustrates the mixing properties of the algorithm. We 
found E(G\y) = (0.2578,0.4995,1.065,1.1848) close to the maximum likeli- 
hood estimates. In addition, we computed credible intervals and correlations 
in the posterior as measures of precision of the estimates. Credible intervals 
(95%) for the parameters a f3, v and r 2 were 

C a = [0.2377, 0.2906] , Cp = [0.4702, 0.6401] , 

C v = [0.7044, 1.4409], C T 2 = [0.7178, 1.7352]. 

found 

^ \ 
0.5568 

0.4534 . 

0.2362 

1 / 

Notice that, for example, the parameters f3 and v are negatively correlated 
a posteriori and a and v positively, a consequence of the definition of a 
Normal-Laplace distribution as the sum of a normal and skewed Laplace 
variables. 



With respect to the posterior correlations, we 
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Fig. 2. Histogram, fitted (dotted line) and theoretical (solid line) pdf for the simulated 
data set of Example 1. 

In Figure 2 the fitted density function, estimated for the data (in log- 
scale), and almost undistinguishable from the theoretical one, is depicted. 
The fitted curve has been computed by simple averaging over the Gibbs 
sampled values, that is, /y(y|y) has been estimated by 

t=i 

We should point out that, if instead of starting the MCMC from the 
maximum likelihood estimates, we start further from this point, the results 
are very similar to those obtained starting from the ML estimates, as long 
as the initial value of a is not very large. It has been observed that, if the 
starting value of a is large and the sample has long tails (small a, as in this 
example), then convergence can be extremely slow and the Gibbs algorithm 
often remains stuck in the tail of the distribution for a long time. Because 
of this fact we suggest starting the MCMC algorithm with small values for 
a (not necessarily the ML estimates). 

Finally, one may wonder how sensitive the method is to the hyperpa- 
rameters. We performed several analyses and our experience is that if the 
real a or (3 are not very large, then the results are not affected by the 
choice of hyperparameters. For instance, in this example, we also set m = 2, 
k = 4, a = b = 2, c a = cp = 0.5, and d a = dp = 0.2, and found E(Q\y) = 
(0.2609,0.5005,1.1833,1.0157) with credible intervals 

C a = [0.2440,0.2927], Cp = [0.4373,0.5988], 
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C v = [0.9188, 1.6039], C T 2 = [0.6210, 1.6226], 

whose lengths are very similar to that found with the first choice of hyper- 
parameters. Also, the fit to the data is almost the same as in Figure 2. The 
next example illustrates the performance of the method when a and/or /3 
are large. 

Example 2. Reed and Jorgensen (2004) state that if there is evidence in 
the analyzed data of heavy-tailed behavior just in one tail, then it is better to 
fit one of the limiting components f\ (2.2) or /2 (2.3); otherwise, a frequentist 
approach may result in the nonconvergence of the optimization algorithm. 
Here we apply the proposed Bayesian procedure to analyze simulated data 
from a dPIN (a, j3,v,r 2 ) with large a, j3. Specifically, we consider three data 
sets SI., S2. and S3., simulated from dPlN(10,0.5, 1, 1) (left heavy tail), 
dPIN (0.5, 10, 1, 1) (right heavy tail) and dPIN (10, 10, 1, 1) (similar to a Nor- 
mal distribution but with heavier tails) distributions, respectively. We as- 
sumed the same hyperparameters as in Example 1, (m, k, a, b, c a , eg, d a ,dp) = 
(0, 4, 1, 1,1,1, 1, 1). Table 1 shows the starting values 6q (ML estimates), pos- 
terior estimates E(9\y) and 95% credible intervals for the large parameters. 
We would like to point out the high variability in the intervals, especially 
if a is large. However, as it can be seen in Figure 3, both the frequen- 
tist and Bayesian approaches perform similarly when fitting the pdf to the 
histogram of the data. This indicates that, as pointed out by Reed and Jor- 
gensen (2004), when a or f3 are large, the density function approaches to 
the three parameters limit case /2 (2.3) or f\ (2.2), and, thus, there is small 
difference in the dPIN density function between multiple values of a or f3. 

To show the versatility of the dPIN model, we next consider two real data 
sets from the insurance and internet context, respectively. 

Example 3. The first data set has been analyzed in Beirlant et al. 
(1998) and Beirlant et al. (2004) and and can be found in http:// 
lstat.kuleuven.be/Wiley/. This contains 1668 claim sizes (expressed as a 
fraction of the sum insured) from a fire insurance portfolio provided by the 
reinsurance brokers Boels & Begaul Re (AON). The data concern claim 
information from office buildings. Next to the size of the claims, the sum 
insured per building was provided. The Gibbs sampler was run under the 
same conditions as in the simulated-data example and posterior estimates 
E(0\y) = (0.51,4.99,7.78,0.76) were found. Note that the posterior estimate 
for a indicates a clear long tail. Figure 4 shows the fit to the histogram 
of the data in log-scale of the dPIN model (solid line) in comparison with 
the fit provided by a mixture of Pareto distributions (dashed line), where 
the number of the components in the mixture, k, may change at each iter- 
ation. Estimation for the k-Par distribution was undertaken in Ramirez, 
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Table 1 

Starting values (MLE) and posterior estimates for the considered simulated data SI., S2. 
and S3, in Example 2, where a or/and /3 take large values. Also, credible intervals for 

the large parameters are shown 





SI: dPlN(10,0.5, 1,1) 


S2: dPlN(0.5, 10, 1, 1) 


S3: dPlN(10, 10, 1,1) 


00 = 0MLE 

E(0\y) 

C a 

Cp 


(4.34,0.50,0.83,1.05) 
(22.74,0.49,1.09,1.07) 
[1.5811,30.4547] 


(0.56,4.53,1.28,1.04) 
(0.55,3.81,1.32,1.02) 

[1.7914,9.7827] 


(4.67,5.53,0.92,0.91) 
(40.81,1.8921,1.52,0.81) 
[3.0217,50.9491] 
[1.4307,3.9651] 




y 



Fig. 3. Fitted pdfs using the ML values (solid line) and the posterior estimates from the 
Bayesian approach (dashed line), for data sets SI., S2. and S3, in Example 2. 

Lillo and Wiper (2008), and as it was commented in Section 1, here the 
Pareto (or mixture of Pareto) distribution fails to capture the body of the 
distribution. In addition, the Bayesian approach considered in Ramirez, Lillo 
and Wiper (2008) is more time consuming than the Gibbs sampler developed 
here. That algorithm was based on a Birth-Death MCMC method, where at 
each iteration a Metropolis-Hastings step is carried out. The Gibbs sampler 
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Fig. 4. Histogram and fitted pdf in Example 3, for the Aon data set (claim sizes in afire 
insurance portfolio) in log-scale, under the dPIN model (solid line) and mixture of Pareto 
components model (dashed line). 

has a number of well-known advantages over standard Metropolis-Hastings 
samplers. For example, the Gibbs sampler requires no tuning, which for 
Metropolis-Hastings algorithms can be time consuming — especially for long 
data sets where the algorithm takes longer to run. 

Example 4. The second real example that we consider is from the tele- 
traffic context. It can be found in the Internet Traffic Archive (BC trace), 
http://www.sigcomm.org/ITA/, where 4 million packet traces of LAN and 
WAN traffic seen on an Ethernet at the Bellcore Morristown Research and 
Engineering facility are recorded. The considered trace, BC-pAug89, began 
at 11.25 on August 29, 1989, and ran about 3142 seconds (until 1 million 
packets had been captured). The measurement techniques in making the 
traces are described in Leland and Wilson (1991) and are a subset of those 
analyzed in Leland et al. (1994). The data set analyzed here consists of the 
measured transferred bytes/sec within the 3142 consecutive seconds. 

We applied the Gibbs algorithm and found posterior estimates E{9\y) = 
(8.59,4.52,11.83,0.59). The mode of this data set is not close to zero, as 
can be observed in Figure 5, and, thus, the mixture of Pareto distributions 
shows a poor performance. Here again, the dPIN model performs well, not 
only capturing the tail but also the body of the set, as can be seen in the 
same figure. 

Thus, from our experience the dPIN distribution has two advantages over 
the k-Par for fitting heavy-tailed data: first, it is able to capture both the 
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Fig. 5. Histogram and fitted pdf in Example 4, for the teletraffic data set (number of 
bytes per second), under the dPIN model (solid line) and mixture of Pareto components 
model (dashed line). 



tail and body of the distribution, and second, the estimation procedure for 
fitting the dPIN distribution is faster computationally than that proposed 
in Ramirez, Lillo and Wiper (2008), for the k-Par density. 



3. Inference for the dPIN/M/l queueing system. In this section we 
shall consider the dPIN distribution as a model for the arrival process in a 
single-server queueing system with independent, exponentially distributed 
service times. The next section reviews this queueing system, denoted as 



dPIN/M/l. 



3.1. The dPIN /M/\ queueing system. The dPIN/M/l system is an ex- 
ample of the G/M/l queueing system, whose properties are well known [see 
Gross and Harris (1998)]. In particular, for the dPIN /M/l system with pa- 
rameters 6 = (a, /?, v, t 2 ), standard results for G/M/l queues imply that the 
mean interarrival time does not exist if a < 1. In this case, the queueing sys- 
tem is automatically stable whatever the service rate /i [that is, E(S) = l//i, 
where S denotes the service time]. Otherwise, the traffic intensity is given 
by 



(3.1) 



(q-lXff + l) 
fia(3e»+ r2 / 2 



INFERENCE FOR DOUBLE PARETO LOGNORMAL QUEUES 



13 



If the system is stable (p < 1), then the steady-state probability for the 
number of customers Q in the system just before an arrival, the stationary 
time W q spent queueing for service, and time W spent in the system are 

P(Q = n) = (1 - r )r£ for a11 n ^ N, 

P(W q <x) = l-r e-^ 1 - ro)x , 

P(W <x) = l-e-^ 1 - ro)x , 

where tq £ (0, 1) is the unique real root of the equation 

(3.2) r = /*(Ml-r )), 

and /*(•) is the Laplace-Stieltjes transform of the interarrival-time density 
function /(•) defined as 

roc 

f*(s)= e~ sx f{x)dx forRe(s)>0. 
J o 

However, the Laplace transform of the dPIN distribution is analytically in- 
tractable so that the standard techniques for finding the root of Equation 
(3.2) cannot be applied. Thus, an alternative approach to obtaining the 
steady state distributions is needed. The next section outlines such an ap- 
proach. 



3.2. A variety of the transform approximation method. The transform 
approximation method (TAM) was developed informally by Harris and Mar- 
chal (1998) and Harris, Brill and Fischer (2000) for the case of approximating 
the Laplace transform of the single parameter Pareto distribution and was 
later extended by Shortle et al. (2004). Here we describe the approach in 
the case of the dPIN distribution. To approximate the Laplace transform 
f*(s) of the distribution of a random variable X , the basic algorithm is as 
follows: 

1. Pick a set of N probabilities, pi, <p\ < ■ ■ ■ < pn < 1. 

2. Find the quantile ti of order pi, P(X <U) = pi. 

3. Assign to each point ti the probability 

P1+P2 



Wl 



2 



Pi+l—Pi-l r ■ r, AT ^ 

Wj = lor i = 2, . . . , JS — 1, 



W N 



2 

p N -l+p N 



4. Approximate the Laplace Transform f*(s) by f^(s) = Yli=i w i e 
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For the dPIN case and once the probabilities pi have been selected, the 
quantiles in step 2 are approximated numerically by Newton-Raphson, with 
initial values obtained from the empirical distribution function of the data. 

Harris, Brill and Fischer (2000) and Shortle et al. (2004) consider dif- 
ferent alternatives for the defining probabilities pi, although, as they point 
out, the choice of the optimal probabilities is an open question. The natural 
approach, known as uniform TAM or U-TAM, is to define uniform probabil- 
ities, Pi = (i — 1)/N. However, this approach leads to poor approximations 
in the tail of the distribution. An alternative algorithm applied in Shortle 
et al. (2004), which better captures heavy-tailed behavior, is the geometric 
or G-TAM algorithm which sets pi = 1 — q l , for q 6 (0, 1). But even when 
q — > 0, few quantiles are selected from the body of the distribution and a 
poor approximation of this part may be obtained with this approach. 

We have found that a combination of both algorithms works better than 
applied separately. We used the U-TAM algorithm to obtain a proportion r 
of percentiles from the body of the distribution and the G-TAM algorithm 
is used to find the other (1 — r) proportion of percentiles covering the heavy 
tail. We consider that the body of the distribution is defined by those per- 
centiles ti such that P(X < ti) < P(X < E[X]), in the case that E[X] exists 
(otherwise, we use the median). Other alternatives (with larger quantiles) 
may be used, but in practice we have found that it makes little difference. 

Formally, if r denotes the proportion of percentiles before i£[-X], and q 
is the geometric rate, then (r, q) <G {r min , r max } x {q min , q max } form 
a grid where the optimal value (r*,q*) is chosen so that the TAM mean 

(Eili^*i) (or the TAM median: t a /££=i™i < 0.5 and £"=i"^ > 0.5) 
matches the mean (or median) of the original distribution. In our examples 
we have found that a grid of size 8 x 17 is enough to get a distance less than 
10 -3 between the TAM mean/median and the theoretical mean/median. 
The proposed methodology satisfies the conditions of Theorem 1 in Shortle 
et al. (2004) so that convergence of f^(s) to f*(s) is assured as N — > oo. 

3.3. Bayesian estimation of the dPIN /M/l queueing system. Given the 
prior distributions and a sample of dPIN distributed interarrival data, we 
have seen that the Gibbs algorithm can be used to produce a sample of values 
0® = («(*),/?(*), !,(*), T (*) ) for t= 1, . . . , T from the posterior distribution of 
the dPIN parameters. 

Supposing now that the service rate, /x, is known, then it is straightforward 
to estimate the probability that the system is stable, 

1 T 

(3-3) P(p<l|y) = -^/(pW<l), 

t=i 

where p® is the value of p calculated from Equation (3.1) setting 6 = 6® 
and /(■) is an indicator function. Given that this probability is high, then 
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for each set 9^ of generated parameters such that < 1, the root 7q 
can be generated using (3.2) and the TAM and, therefore, the conditional 
posterior distributions of queue size and waiting times, given stability, can 
be estimated by Rao Blackwellization, that is, by simply averaging over 
the parameters satisfying the stability condition. Thus, for example, the 
posterior distribution of queue size P(Q = n\y) is estimated by 

1 S 

-J2P(Q = n\0 {s \»), 

s=l 

where 6^ , . . . , 0^ is the set of parameters satisfying the stability condition. 

One point to note, however, is that, as commented in Wiper (1997), the 
means of the fitted equilibrium queue size and waiting time distributions 
do not exist. This is a typical feature for Bayesian inference in G/M/- or 
M/Gj '• queueing systems. Thus, if posterior summaries of these distributions 
are required, it is preferable to use the median and quantiles. 

When the service parameter is unknown, then, given an independent sam- 
ple of service time data, conjugate inference for the service rate can be car- 
ried out as in, for example, Armero and Bayarri (1994). For a Monte Carlo 
sample, pP~\ ■ ■ ■ from the posterior distribution of the service rate, the 
traffic intensity may be estimated by calculating p"> given (0W,/i(*)) and 
averaging as in (3.3). In order to condition on the existence of equilibrium, 
only those parameter sets (6^\p^) such that < 1 are retained. 

3.4. Application to internet traffic analysis. Internet traffic data has 
lately become a wide field of study and numerous works have character- 
ized it as having some unusual statistical properties such as self similarity 
and heavy tails; see, for example, Willinger, Paxson and Taqqu (1998). In 
particular, as shown in Paxson and Floyd (1995), internet arrival traffic can- 
not be well modeled by a Poisson process. As an alternative, heavy-tailed 
distributions can be considered. 

Figure 6 shows the histogram of a set of interarrival times (in seconds) 
of a trace of 1 million ethernet packets, derived from BC-pAug89 in the 
Internet Traffic Archive (described in Example 3 of Section 2.3). The first 
(according to the outcome) 50,000 interarrival times (in sec) are analyzed 
here. Superimposed (in solid line) is the fitted dPIN density generated using 
the Bayesian algorithm described in Section 2. Also superimposed (dashed 
line) is the fitted Pareto density. In this example the Pareto distribution 
captures the tail of the distribution but has a poorer performance in the 
body of the distribution. It can be seen in Ramirez, Lillo and Wiper (2008) 
that a mixture of two Pareto components provides a good fit of this data 
set, however, the high computational cost of that algorithm makes this one 
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Table 2 



Probability 


of eq 


uilibrium and traffi 


s intensity. 


When n is large (faster 


service on 




average), the probability 


of stability 


of the system increases 




M 




E{S) 




P(p<l|y) 


E(p|y) 


1500 




0.0006 




1 


0.2616 


1000 




0.001 




1 


0.3923 


500 




0.002 




1 


0.7844 


400 




0.0025 




1 


0.9798 


395 




0.002531 




0.8257 


0.9946 


394 




0.002538 




0.7869 


0.9969 


393 




0.002544 




0.6115 


0.9979 


392 




0.002510 




0.4562 


1.0008 


391 




0.002550 




0.4284 


1.0040 


390 




0.002564 




0.2519 


1.0065 


385 




0.002597 







1.0194 



based on the dPIN distribution preferable. The posterior mean parameter 
estimates for the dPIN model were E(0\x) = (2.15,1.07,-6.00,0.36). 

Now we shall consider the queueing aspects. Given the dPIN arrival pro- 
cess, we shall assume that arrivals are processed by a single server with 
exponentially distributed service times with rate //. Table 2 shows the pos- 

0.5 r- 

0.45- 

0.4 
0.35- 

0.3- 
5 0.25 - 

0.2 
0.15- 

0.1 - 
0.05- 

<y- 

-14 -12 -10 -8 -6 -4 -2 

y = log(x) 

Fig. 6. Histograms and fitted pdf for the internet data (50,000 real interarrival times) in 
log-scale, under the dPIN model (solid line) and Pareto model (dashed line). 
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Table 3 

Predictive system size distribution just before an arrival for the internet data set, for an 
assortment of service rates fi. As expected, for faster services (large \x) the probability of 
an empty system is larger than for slower services 





P(Q = O) 


P(Q = 1) 


P(Q = 2) 


P(Q = 3) 


1500 


0.3167 


0.2161 


0.1475 


0.1008 


1000 


0.2813 


0.2019 


0.1449 


0.1042 


500 


0.2182 


0.1703 


0.1330 


0.1039 


400 


0.1955 


0.1570 


0.1260 


0.1014 


395 


0.1948 


0.1569 


0.1260 


0.1013 


394 


0.1946 


0.1565 


0.1259 


0.1013 



terior probability of equilibrium (third column) and the expected value for 
the traffic intensity (fourth column) for an assortment of values of \i [the 
expected service time is E(S) = 1/ fj\. From this table, it is clear that there 
is a high probability that the system is stable (that is, no congestion occurs) 
for values of /i greater than 394. Figure 7 depicts the fitted system waiting 
time W, and queue waiting time W q , distributions for values of \i greater 
than 400. Table 3 illustrates the distribution of the number Q of clients in 
the system in equilibrium. We can see that as the service rate increases (i.e., 
the service is faster), then the median queueing and system waiting times 
and the number of clients in the system decrease, as would be expected. 

In this example we have also compared the queueing results obtained with 
the dPIN model with those obtained from the queueing systems Pareto/M/1 
and M/M/l. Different estimates of the system and queue waiting time dis- 




FlG. 7. Predictive system and queue waiting times distributions for the internet data set 
for an assortment of service rates (o: = 400, *: fi = 500, +: fj, = 1000 and □: y,= 1500 ). 
As expected, when the service is faster (\x increases) , then the probability of waiting less 
than a short time is larger. 
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tributions under the different queueing models were obtained. The fitted 
system size distribution just before an arrival among these different queues 
also varies, for example, the probability that the system size is larger than 
2 or than 3, P(Q > 2), P(Q > 3) is larger with the dPIN model than with 
the Pareto or Exponential models. On the contrary, the values P(Q > 0), 
P(Q > 1) are smaller with the dPIN model than with the other ones. 

4. The M. I dPIN / 1 queueing system and ruin probabilities. In this sec- 
tion we consider the M/dPIN /l queueing system, with independent, expo- 
nentially distributed interarrival times and dPIN service times, and show 
how the Bayesian approach to estimate the dPIN can be used to estimate 
the probability of ruin from actuarial data. 

4.1. The M/dPlN/1 queueing system. The general properties of the 
M/G/l queueing system are well known; see, for example, Gross and Harris 
(1998). In particular, if the service time S is assumed to follow a dPIN dis- 
tribution with = (a, fijVjT 2 ), then, if a < 1, E(S) = oo and the queueing 
system is never stable, whatever the interarrival rate A. When a > 1, the 
traffic intensity is given by 

\aPe u+r2 / 2 
P "(a-l)(/3 + l)' 

The Laplace transform W*(s) of the equilibrium waiting time in the queue 
is related to the Laplace transform B*(s) of the (dPIN) service time by 

where W q (t) is the distribution function of the waiting time. In order to 
obtain the distribution function of the waiting time W q (t), we first apply 
the TAM to approximate B*(s) as earlier. Second, we can use a standard 
numerical approach to invert the Laplace transform, W*(s); see, for example, 
Shortle, Fischer and Brill (2007) for a review. In this case, we apply the 
recursion method by Fischer and Knepley (1977). 

4.2. Application to fire insurance claims. In an insurance context, it is 
often assumed that claim sizes, Cj, are independent and identically dis- 
tributed heavy-tailed random variables; see, for example, Rolski et al. (1999). 
Here, we shall assume that claim sizes can be modeled as dPIN random 
variables. Often, it is also supposed that the interclaim times, Tj, are inde- 
pendent, exponentially distributed variables with rate A. Let u denote the 
initial reserve of an insurance company and let r be the rate at which pre- 
mium accumulates. Then, the company's wealth, or risk portfolio at time t, 
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is 

N(t) 

R(t) = u + rt- J^Ci, 

i=l 

where N(t) = sup(n : Yl7=i ^ <t) is & Poisson counting process with rate A. 

Clearly, the insurance company will be interested in the probability that 
they may eventually be ruined, given their initial capital and premium rate, 
that is, 

ip(u,r) = P(R(t) < for some t > | initial capital u, premium rate r). 

(4.1) 

If the mean claim size does not exist, then eventual ruin is certain. Oth- 
erwise, we can define the traffic intensity of this system as p = \E[Ci]/r and 
it is well known that ruin is certain if p>l. In the case that p < 1, then 
in, for example, Prabhu (1998), it is shown that the ruin probability can be 
computed as the steady state probability that the waiting time exceeds u/r 
in a M/G/l queueing system, where the interarrival time and service time 
distributions are the same as the distributions of Tj and Ci/r respectively. 
Table 4 shows this duality. Thus, estimating the M/dPlN/1 queue allows 
us to estimate the probability of ruin where the claims sizes are assumed to 
follow a dPIN distribution. 

Note that by scaling appropriately, it can be assumed without loss of 
generality that the premium rate, r, is equal to 1 and we shall do this from 
now on, writing ip(u) for the ruin probability of Equation (4.1). 

Assuming the M/dPlN/\ model and given some initial reserve u and 
claim arrival rate A and a sample of claim sizes, then the posterior parameter 
distribution of the dPIN claim size distribution can be estimated using the 
Bayesian approach as outlined in Section 2 and this can be combined with 
the TAM and recursion algorithms to estimate the ruin probability. 

To illustrate this approach, we consider data treated in Beirlant and 
Goegebeur (2003) and Beirlant et al. (2004) representing 9181 fire claims val- 
ues for the period 1972-1992 from a Norwegian insurance portfolio. Together 

Table 4 

Duality between the probability of ruin in a risk theory context 
and the M/G/l queueing sytem with steady-state queue 
waiting time distribution W q 



Queueing theory 


Risk theory 


Interarrival times 


Interclaim times 


Service times 


Claim sizes 


r(w q > u) 


Probability of ruin 


for a M/G/l 


with initial reserve it 
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with the year of occurrence, the values (xlOOO Krone) of the claims are 
known. They can be found in http : //ucs . kuleuven.be/Wiley/ index . html. 
The left panel of Figure 8 shows the data in log-scale (values of the claims) 
and the Bayesian dPIN fit. The right panel of Figure 8 illustrates the log- 
transformed fitted Pareto (dotted line) and Exponential (dashed line) mod- 
els to this data set. Again, the Pareto model does not capture the body of 
the distribution; the Exponential fit is even worse, it captures neither the 
body, nor the tail. 

Assuming that the system is stable, we can now estimate the ruin prob- 
ability for different interclaim rates and initial reserves. In this case, the 
expected claim size, conditional on this existing (i.e., that a > 1), is approx- 
imately 2915, which implies that in order to avoid extremely high probabil- 
ities of ruin, we should typically consider plausible values of A to be below 
1/2915. Figure 9 depicts the posterior probability of ruin, E{ip(u) |data), for 
a grid of values of different average interclaim times, 1/A, and various ini- 
tial reserve levels, u. As would be expected, when both the initial reserve 
u and the expected interclaim times 1/A are low, then the ruin probability 
increases. 

As we did for the dPIN/M/l queueing system with the teletraffic data 
set, given theses claim sizes, we have also compared the performance of the 
M/dPIN /l queue with the M / Pareto /l and M/M/l queueing system, as- 
suming a rate A = 1/4000. When fitting a Pareto distribution to the data 
with a Bayesian approach, it was found that a posteriori, the sampled pa- 
rameters of the Pareto distribution led to a lack of moment of order one, 
indicating that, since E(S\y) = oo, then the corresponding M / Pareto /l sys- 
tem is not stable, given the data. For the M/M/l model something similar 




y=log(x) y=log(x) 

Fig. 8. Histograms and fitted pdf for the Norwegian data (claim sizes) in log-scale, under 
the dPIN (left panel, solid line), the Pareto (right panel, dotted line) and Exponential (right 
panel, dashed line) models. 
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8000 



Fig. 9. Probabilities of ruin (z-axis) for the Norwegian data insurance company, for an 
assortment of initial reserves u (values from to 4 x 1Q 4, ) and mean interclaims times 
1/A. As would be expected, when the initial reserve is low and the claims occur frequently 
on average (low values ofl/X), then the probability of ruin increases. 

was found: 1 < p^> < oo for most of the iterations, and, thus, the posterior 
probability that the system is stable was very low. Thus, we could not pre- 
dict the probability of ruin, under these models. Finally, the same comments 
as in Section 3, concerning the estimation of the arrival rate A (interclaim 
times rate) when it is considered as an unknown parameter, can be also 
applied here. 

5. Conclusions. In this work we have developed Bayesian inference for 
the double Pareto lognormal distribution and have illustrated that this 
model can capture both the heavy-tail behavior and also the body of the dis- 
tribution for real data examples. Bayesian inference was implemented with 
the Gibbs sampler, although, since 6 is only 4 dimensional, several alterna- 
tives exist and were attempted. The use of importance sampling was difficult 
because of a lack of good distributions for the initial sample that avoided 
degeneracy. A block Metropolis algorithm using a multivariate normal pro- 
posal, with covariance matrix estimated by maximum likelihood, was also 
attempted but exhibited poor mixing for r and slower computation time. 
This suggests that the Gibbs procedure should be preferred. 

Second, we have combined this approach with techniques from the queue- 
ing literature in order to estimate posterior equilibrium distributions for the 
dPIN /M/l and M/dPlN/1. To do this, we have adapted the transform ap- 
proximation method, in order to estimate the Laplace transform of the dPIN 
distribution and the waiting time distribution in the M/dPlN/1 system. 
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Finally, we have illustrated this methodology with real data sets, estimat- 
ing first waiting times and congestion in internet and computing the prob- 
ability of ruin in the insurance context, making use of the duality between 
queues and risk theory. Comparisons with the M/M/l, Pareto/M/1 and 
M/Pareto/1 have been also carried out. Differences among these queueing 
systems, especially when the service process is heavy-tailed, were found. 

A number of extensions are possible. First, we could extend our results 
to the case of a multiple number of servers, that is, to the dPIN / M / c and 
Mj dPIN jc queueing systems or to finite capacity systems. It would be also 
interesting to study the optimal control of the systems, that is, when to open 
or close the queue and which is the optimum number of servers, following 
the lines of Ausin, Lillo and Wiper (2007). 

Also, in this article, we have just considered semi-Markovian queueing 
systems where either the service or interarrival times were exponential. An 
extension is to explore more general distributions, in particular the so-called 
phase- type distributions. 

It would be interesting, too, to consider a nonparametric estimate of the 
Laplace transform from data, so that a parametric specification of the dis- 
tribution entirely would be avoided. This has been suggested by one of the 
referees, and will be considered in future work. 

Finally, in terms of the application to insurance, it would also be impor- 
tant to explore the estimation of transient or finite time ruin probabilities 
which are also of interest to insurers. 

All Matlab codes and real data utilized in the examples are available in 
the supplemental material Ramirez et al. (2010). 

APPENDIX A: PROOF OF PROPOSITION 1 

For ease of notation, we write z\y, a, /3, u, r 2 as z\y throughout this proof: 

fy,z{y,z) 



fz\y{Av) 



My) 
fz{z)fw{y - z) 

Mv) 

1 1 ( z — v\ aft 



f Y {y)r V T Ju + P 

x [exp(/3(y - z))I z > y + exp(-a(y - z))I z<y ] 

er» 2 /^ 2 ) a/3 
Tfr{y) a + /3 



+ exp ( - ^ [z 2 -2z{u + t 2 o) + 2r 2 ay} J I z<y 
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Tfv{y) a + p 

exp (" ~h {{z ~ {v ~ r2/3))2 " 2r2/3y ~ {v ~ r2/3)2] ) Iz ^ y 



exp( -^[{z-{v + T z a)Y + 2T z ay-{v + T z aY] )I Z< 



rfviy) a + P 

{ >h-(v-W^ exp(-^(, - („ - r 2 /3)) 2 ) 
+ e „ Qy+(i , +r2Q)2/2r2 eX p(-^(z - („ + r 2 a)) 2 )/ ; 



e -" 2 /(^ 2 ) a/3 
My) a + /3 



+ e 



T<$> c {yP)' 

- a y + (u+T 2 a) 2 /2r 2 $ , y a\ </>( z ° 



1 a/3 
fv(y)a + !3 



(2ft,-2„/?+r 2 /? 2 )/2g C / x Hz 13 ) j 
, (-2ay+2v a +T 2 a 2 )/2frC ( \ <H^) r 

+ e ^ [ya) T $(y a ) z<y 



1 a/3 ,( V ~ v 



fy{y)a + P V r 

Hy^ T&(y(>) z - y+ Hy a ) ^ c (y Q ) z<v . 



which gives the conditional density 

fz\ y (z\y) = R{yp) 



<f>(z a ) 



r$ c (yP) Iz - y + R{ya) T ^(y«) Iz<y ) / Wya) + R{y ^- 



APPENDIX B: PROOF OF PROPOSITION 2 



Since 



W = E 1 -E 2 where E 1 ~£(a) and E 2 ~£(P), 
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then, the distribution of 2?i|W is 



/si|vr( e i 



■w 



fw(w) 
fE u E 2 (ei,ei - w) 
fw{w) 

■fei(ei)/£ 2 (ei - w) 
fw{w) 

0, if e\ < max{w, 0}, 



ate 



-aei 



(a(3/(a + P))[e^I w<0 + e~ aw I w > 
(a + /3)e"( Q +' 3 ) ei 



for ex > max{w, 0} 



T 4- f>~( a +P) w T 

A w<0 \ ^ lr , 



for ei > max{u>,0}. 



w>0 



APPENDIX C: PROOF OF PROPOSITION 3 
Note first that 

P(w\ >0,...,w n > 0\y,0) = P(z\ <yi,...,z n < y n \y,0) 



n*(^)>o 

i=i 



for any set y and where $ is the standard normal cumulative distribution. 
Therefore, 

P(wi >0,...,w n > 0|y) = f P( Wl >0,...,w n > 0|y, 0)/(0|y) dG > 

for any y. Similarly, P{w\ < 0, . . . , uu n < 0|y) > for any y. 
Now consider the posterior distribution of a,/3|w, 

/(a,/3|w)a/(w|a,/3)/(a,/3) 

oc (^ ^^ j exp ^2 w iH w i < 0)1 ex P ^2 > °)^ 

x /(a,/3)- 

In the case that all to, < 0, then when a — > oo, for any given /3, 

/(Q|/3,w)a/(a,/3|w)^c(/3)/(a|/3) 
for some c(/3) > 0. Equally, if all Wi > 0, then when /3 — > oo, for any given a, 

/(/3|a,w)oc/(a,/3|w)^d(a)/(/3|a) 
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for some d(a) > 0. Therefore, if f(a\/3) da is divergent for any a > 0, 
then we have immediately that when a —> oo, /(a|w, (3) — > c(/3)f(a\/3), which 
implies that the posterior distribution of a is improper and similarly in the 
case of an improper prior for (3\a. 

Acknowledgments. The authors are grateful to three anonymous review- 
ers for their detailed and insightful comments on an earlier version. 

SUPPLEMENTARY MATERIAL 

Supplement: Matlab Toolbox (DOI: 10. 1214/10- AOAS336SUPP; .zip). 
The Matlab toolbox performs Bayesian estimation for the double Pareto 
Lognormal (dPIN) distribution, and for the queueing systems dPIN/G/l 
and M/dPlN/1. 
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